The problem with face recognition systems today is that they are not able to recognize and 
learn new faces automatically while in operation. 

c The measures/device features that are proposed to solve the problem, and the resulting 
advantages. If the invention is based on a new understanding (insight), please indicate 
this. 

The proposed system in this invention automatically adds new faces to an existing database 
and keeps learning new faces. In that sense, in contrast to existing systems, our online 
learning system can learn features of new faces and store corresponding models for new 
faces. This is especially important in the area of new digital cameras, PDAs and portable 
storage containers with imaging capability. 

d Provide at least one embodiment of the invention, where you explain the best way of 
carrying out the invention. Please add drawings, graphs, test data etc. where appropriate. 

Summary 

The main idea is to recognize known faces, detect unknown faces and apply automatic 
online learning for unknown faces in videos. After the online learning, our Classifier could 
recognize the new (unknown) faces presented before. After the recognition, the Classifier will 
assign recognized face IDs to the faces. 

The faces used for classifier training and online learning could be found on Internet sources or 
in screenshots from the video. After our classifier detects a new face, the database will be 
updated with the new face. 

Most face recognition systems will only recognize a fixed number of faces in the database, and 
the face database cannot be updated during the classifying procedure. Our approach could 
automatically detect the new faces and extend the database based on the new faces. Also, 
our approach could generate a confidence measurement for each recognized face in the 
database and sort the candidates by the confidence measurement, which makes 
post-processing easier. 

Modified Probabilistic Neural Networks 

Probabilistic Neural Networks are used in the standard machine learning literature. The 
purpose of the modification on Probabilistic Neural Networks is to detect an unknown pattern 
and to do online learning for the unknown pattern so that the unknown pattern can be 
recognized next time by our networks. In order to detect the unknown pattern, a threshold is 
set on the category layer of Probabilistic Neural Networks. The threshold could be a 
comparative or absolute value depending on how the system is designed. If the output value 
from the pattern layer is lower than the threshold, the Classifier will recognize the input pattern 
as an unknown pattern. After it detects an unknown pattern, the Classifier will store the unknown 
pattern information in the hidden layer. Therefore, when the unknown pattern appears next time, 
the Classifier can recognize this unknown pattern. 

System Diagram 

[Figure: system diagram with the blocks Video Inputs, Sample, Face Detection, Classifier 
Training, Face Classifier and Online Training] 

In our implementation, we choose the Vector Quantization (VQ) Histogram feature for the 
Classifier. However, one can use another feature space that is used for face recognition in the 
literature (e.g. EigenFaces). The VQ Histogram calculation procedure is shown below: 



Low-pass Filter 

We forward the face image into a low-pass filter first. The low-pass filter is used for reducing the 
high-frequency noise and extracting the most effective low-frequency component for 
recognition. We then divide the image into 4-by-4 blocks. Next, we calculate the minimum intensity 
in each 4-by-4 pixel block and subtract that minimum from each pixel in the block. Therefore, we 
get an intensity variation in each block. Then, for each block divided from the face 
image, we match the block with all the codes in the codebook, and the most similar 
codevector is selected. Euclidean distance is used for the distance matching. After 
performing VQ for all the blocks divided from a facial image, matched frequencies for each 
codevector are counted and a histogram is generated. 
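
The procedure above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the low-pass filter is taken to be a 3-by-3 mean filter, the blocks are taken to be non-overlapping, and the codebook is passed in by the caller. The names `box_lowpass` and `vq_histogram` are hypothetical and do not come from the disclosure.

```python
import math

def box_lowpass(img, k=3):
    """k-by-k mean filter with edge clamping (an assumed stand-in low-pass filter)."""
    h, w, r = len(img), len(img[0]), k // 2
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            total = 0.0
            for dy in range(-r, r + 1):
                for dx in range(-r, r + 1):
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    total += img[yy][xx]
            out[y][x] = total / (k * k)
    return out

def vq_histogram(img, codebook, block=4):
    """Return, per codevector, the number of blocks it matches best."""
    smooth = box_lowpass(img)
    hist = [0] * len(codebook)
    # Assumption: non-overlapping block-by-block tiles.
    for by in range(0, len(smooth) - block + 1, block):
        for bx in range(0, len(smooth[0]) - block + 1, block):
            pixels = [smooth[by + i][bx + j]
                      for i in range(block) for j in range(block)]
            low = min(pixels)
            variation = [p - low for p in pixels]  # subtract the block minimum
            # Select the nearest codevector by Euclidean distance.
            best = min(range(len(codebook)),
                       key=lambda c: math.dist(variation, codebook[c]))
            hist[best] += 1
    return hist
```

Because the block minimum is subtracted before matching, adding a constant brightness offset to the whole image leaves the histogram unchanged in this sketch.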

The figure on the right below shows a sample VQ Histogram for the face on the left. 




The VQ Histogram is insensitive to geometry information of the face, robust to lighting, pose 
and expression, and also independent of face position if the background is uniform. 

Probabilistic Neural Networks Structure 

The standard Probabilistic Neural Network contains three layers: input layer, pattern layer and 
category layer. The input layer normalizes the input vectors and forwards the normalized 
data to the pattern layer. The pattern layer then adds one node for each training 
example and saves the normalized input vector as its weights. The category layer chooses the 
maximum value calculated from the pattern layer and calculates the confidence measurement 
based on the values from the pattern layer. The Probabilistic Neural Network diagram is shown 
below: 



[Figure: Probabilistic Neural Network structure with Input Layer, Pattern Layer and Category Layer] 

Confidential. No disclosure of the contents to persons outside Philips is allowed 
without the written permission of Koninklijke Philips Electronics N.V. who is the 
owner of this information. 



Probabilistic Neural Network Training 

The Probabilistic Neural Network will normalize all the data from the input vector. The 
normalization calculation is shown below: 

\[ X'_n = \frac{X_n}{\lVert X_n \rVert} \]

X_n is the input vector and X'_n is the normalized input vector, which has unit length. 

Then, the new node in the pattern layer is linked to the supervised category. 

Classifying Procedure 

Then, in the pattern layer, the normalized input performs a dot product with the saved 
weights as shown below: 

\[ Z_i = X' \cdot W_i \]

X' is the normalized input vector and W_i is the weight. Z_i is the dot product value for each 
node in the pattern layer. 

Then do the following calculation: 




PHILIPS 



.1^-,^ u, , • leo-ro «-^— iB^w im^ne Database on 01/27/2005 



Philips Intellectual Property & Standards 

Invention Disclosure 

Version 1 AO 14-1 

This template is meant for making a detailed description of your invention off-line. The 
description can then later be attached in electronic form to your ID submission. You must 
also use this off-line description to obtain export clearance from your local export-control 
officer. 

σ is the smoothing factor and Z'_i is the output value for each node in the pattern layer. 

In the category layer, the Probabilistic Neural Network compares the output values from the 
pattern layer and calculates the confidence measurement based on those values. 
The nodes in the category layer do the following calculation: 

C_i is the confidence measurement for each category, and n_i is the number of pattern nodes 
for category i. 

Online Training & Detecting new faces 

We use a fuzzy determination to detect newly appearing faces. The fuzzy determination is 
performed using the following rules: 

1. Output value from pattern layer is below threshold 

2. Mean of output value from pattern layer during a time series is low 

3. Distance to other clusters is stable 

4. The face lasted for a particular time slice 

After we detect a new face, we do the online training in order to store the 
information of the new face. The online training will: 

1. Add new nodes in the hidden layer for each new face 

2. Normalize the new face's input vectors and save them as weights 

3. Link the new nodes to a new category in the category layer 

The information of the new face will be stored in the hidden layer and category layer of the 
Probabilistic Neural Network. Therefore, when the "new face" appears again, the Probabilistic 
Neural Network will recognize this face as a known face. 

Implementation 

There are several possible implementations, as described below. 

• Actor/Actress recognition 

Our face recognition system could recognize the faces in movies, home videos and 
business videos with a pre-trained Probabilistic Neural Network. For face recognition in 
movies, we could find all of the player names from the screenplay or other sources (e.g. 
IMDB.com), search for all of the players' faces on the Internet, and train the Probabilistic 
Neural Network based on the images found there. After the training, the Probabilistic Neural 
Network is ready for classifying. 

• Automatic editor for home movies 

In this scenario, our user wants to see a movie clip only when some specific persons appear 
in the video. Applying our face recognition system could solve this problem. The system will 
recognize faces in the movie and extract the clip based on the recognized faces. If the 
recognized faces are the faces that the user wants to see, then we will show the user this clip. 

• Role Analysis 

Apply the face recognition system to the whole video and count the appearance time for each 
recognized face. The face with the longest appearance time is the main actor/actress of the 
video. 

• Automatic Photo Distribution: Automatic picture e-mailing system based on face recognition. 



• Search Photo based on the wanted people 

The face recognition system could also be applied to images. For example, our user has a 
huge digital image library, and he/she wants to look at his/her grandma's pictures. Our face 
recognition system could help the user to do the time-consuming search. After we train the 
Probabilistic Neural Network on the persons we want to recognize, we apply the 
Probabilistic Neural Network to all the images in the library. Our system will automatically 
recognize his/her grandma and show the images to the user. 

• Meeting Summarization 

Combining with information from other sources (e.g. audio, meeting notes), we could 
generate a meeting summarization associated with the face ID, meeting notes and time 
stamp. 



e Indicate in which fields (technical, commercial) the invention can be applied. 

Digital cameras, PDAs & mobile phones with cameras, home media servers, DVD+RW combi 
product. 

Abstract 

We developed an online-learning face recognition 
system based on the Modified Probabilistic Neural 
Networks (MPNN) for videos. This face recognition 
system can detect and recognize faces, as well as 
automatically detect unknown faces and train the 
unknown faces online into the face Classifier so that this 
"unknown face" can be recognized if it appears 
again. The MPNN is implemented by setting a threshold 
on the category layer of Probabilistic Neural 
Networks (PNN) in order to detect an unknown category 
of input data. Following some fuzzy rules based on the 
detected unknown category from MPNN, the system 
can then detect the unknown faces in videos. The 
PNN feedforward training makes the online training 
very fast without changing the weights between the 
trained faces. 

1. Introduction 

Digital images and digital video are becoming 
more and more ubiquitous for both home 
users and business users. Most of these still and 
moving images contain people and their activities. As 
the number of these images explodes, it becomes a 
survival task to access these images based on who is 
present. Facial analysis has been a very active research 
area recently. Face recognition for faces that are 
known in advance exists. The biggest weakness of face 
recognition systems today is that they are not able to 
recognize and learn new faces automatically while in 
operation. 

Most face recognition systems are trained on a 
fixed number of faces which are known in advance. 
These systems will only recognize the faces with 
known models, and the face database cannot be 
updated during the classification procedure. In this 
respect these systems are very limited once they are 
placed in operation. They will work for surveillance 
systems, which have to recognize all employees of a 
company and alert to any intruders. However, in the 
area of home video, TV broadcast video and wearable 
video, there are new people appearing as the story 
unfolds. If a system is trained to recognize only family 
members, then a visitor is labeled as "other" or 
"unknown". Of course there are travel videos with 
many new faces that are transient. A system that 
categorizes images and videos based on people 
presence has to distinguish all these categories of 
important and unimportant faces. Moreover, the 
system has to be flexible enough to incorporate and 
retain important faces. 

Our approach can automatically detect the new 
faces and extend the database based on the new faces. 
Our online learning system can learn features of new 
faces and store corresponding models of new faces for 
future use. Also, our approach can generate a 
confidence measurement for each recognized face in 
the database and sort the candidates by the confidence 
measurement, which makes post-processing easier. 

2. System Architecture 

Figure 1 shows our face recognition system 
architecture. There are two approaches to bootstrap 
the system: 1) the initial database has a limited number of 
faces, and 2) the initial database is empty. If the system is 
first trained on the initial database, we gain high 
recognition accuracy. This method is similar to our 
human perception of known faces and incorporation of 
new faces. We assume that we know some faces of the 
video clip before we perform the online learning for 
the whole video clip. The system has a training phase 
and a classification phase just like any other face 
recognition system. However, the important aspect 
here is that there is a feedback arrow to the training 
phase for unknown faces. 



Figure 1. Face Recognition System Architecture (input: Video Frames) 
During the training phase, the system will read face 
examples for each face (actor/character) and train the 
Probabilistic Neural Networks (PNN) [4][2] based on 
the features of these faces. The size of PNN will 
increase during the training and it is decided by the 
following equation: 

where n_i is the number of training faces for the 
person i and α is the number of persons in the 
initial database to be trained by PNN. 

During the classification phase, the system will decode 
the MPEG video file into video frames first. For each 
frame, we use a variant of the face detector described 
in [8]. If there is a face found by the face detector, the 
face segment will be forwarded to the PNN-based Face 
Classifier. If it is a known face, the confidence 
measurement for each face ID will be generated by 
PNN; otherwise, the unknown face will be evaluated 
and forwarded to the online learning phase if necessary. 
After we have the confidence measurement for each 
Face ID, we can easily choose the Face ID with the 
maximum confidence measurement as the output from 
the Face Classifier. 

3. Face Detection 

This section briefly describes the face detection 
algorithm used in our framework. In [8], Viola and 
Jones applied the popular AdaBoost [12] learning 
technique to the problem of rapid object detection. 
They used an attentional cascade of strong classifiers 
that consisted of a set of computationally efficient 
binary features (also called weak classifiers). Each 
round t of boosting added a single feature h_t to the 
current set of features by minimizing: 

\[ Z_t = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i)) \]

where D_t(i) is the weight on example x_i at round t, y_i ∈ 
{-1, +1} is the target label of the example, α_t is the 
influence of this weak hypothesis on the strong 
classifier and h_t(·) is the weak binary hypothesis 
restricted to {-1, +1}. In our variant, we use boosting 
stumps (decision trees that partition the domain into 
two pieces and yield a prediction for each partition) as 
the weak classifiers and our goal is now to minimize: 

\[ Z_t = \sum_i D_t(i) \exp(-y_i h_t(x_i)) \]

where α_t has been folded into h_t, thereby allowing the 
weak hypotheses to have a range over all ℝ rather 
than the restricted range [-1, +1]. The prediction 
values for the left and right partitions that minimize Z_t 
above are: 

where the W's denote the weight of the examples that 
are assigned to the left or right partition with true 
labels "positive" or "negative". The predictions are 
also smoothed with the term ε to avoid numerical 
problems caused by large predictions. Typically, ε is 
chosen on the order of the reciprocal of the number of 
training samples in our system. From these prediction 
values, we can greedily choose the splitting criterion 
for the decision tree (dropping the subscript t) as 

\[ Z = 2\left(\sqrt{W_+^l W_-^l} + \sqrt{W_+^r W_-^r}\right) \]

rather than the Gini index or an entropic function 
[12]. 
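
The stump computation described above can be illustrated as follows. The formula for the prediction values is not legible in this copy, so the sketch assumes the standard smoothed form from confidence-rated boosting, c = ½ ln((W₊ + ε)/(W₋ + ε)) per partition; the function names `stump_predictions` and `split_score` are hypothetical.

```python
import math

def stump_predictions(w_pos_left, w_neg_left, w_pos_right, w_neg_right, eps):
    """Smoothed real-valued predictions for the left and right partitions.

    Assumption: c = 0.5 * ln((W+ + eps) / (W- + eps)) for each partition.
    """
    c_left = 0.5 * math.log((w_pos_left + eps) / (w_neg_left + eps))
    c_right = 0.5 * math.log((w_pos_right + eps) / (w_neg_right + eps))
    return c_left, c_right

def split_score(w_pos_left, w_neg_left, w_pos_right, w_neg_right):
    """Z = 2 * (sqrt(W+l * W-l) + sqrt(W+r * W-r)); smaller is better."""
    return 2.0 * (math.sqrt(w_pos_left * w_neg_left)
                  + math.sqrt(w_pos_right * w_neg_right))
```

A split that separates the classes perfectly gives Z = 0, while a split that leaves both partitions evenly mixed gives the largest Z, so a greedy learner would pick the candidate split with the smallest score.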

A few variants [9][11] of the learning algorithm 
described in [8] have been proposed recently. These 
algorithms reduce the training error (i.e. error in the 
training set) during training and count on the 
generalization performance of AdaBoost that is 
rigorously proved in [12]. It is our experience that 
using a validation set during training as in [8][10] 
yields the most effective cascades. In addition, we just 
scan the validation set once (rather than several times 
as in [10]) for each weak classifier that is added to the 
current cascade in order to adjust the strong classifier 
threshold. We do this by keeping track of the 
rectangles and their corresponding last stage sums that 
pass through all but the penultimate stage of the 
current cascade (for the first stage, this amounts to 
keeping track of all rectangles scanned and their 
corresponding sums). Our final trained cascades 
typically have around 30 stages, and the entire training 
process takes slightly less than a week to complete on 
a dual 2.8GHz Xeon processor with 2GB of memory. 
We use around 4000 positive samples and 5000 
negative samples for training each stage of the cascade, 
where the negative samples for each stage are the false 
positives obtained by scanning the current cascade on 
an image set with no faces. Our validation set consists 
of around 200 faces. The face detector runs 
comfortably in real-time on a 750MHz P-3 laptop and 
can detect faces at 10 different scales. 



4. Face Features 

This section introduces a Vector Quantization (VQ) 
Histogram based face feature [1], which we chose for 
our Face Recognition system. However, one can use 
another feature space that is used for face recognition 
in the literature (e.g. EigenFaces). The VQ 
Histogram calculation procedure is shown in Figure 2. 

Figure 2. Face Feature Generation Procedure (from Face image to VQ Histogram) 

In order to generate the VQ Histogram, we forward 
the face image into a low-pass filter first. The 
low-pass filter is used for reducing the high-frequency 
noise and extracting the most effective low-frequency 
component for recognition. We then divide the image 
into 4-by-4 blocks. Next, we calculate the minimum 
intensity in each 4-by-4 pixel block, and subtract the 
minimum intensity from each pixel in the 4-by-4 
block. Therefore, we can obtain an intensity variation 
in each block. Then, for each block divided from the 
face image, we match the block with all the codes in 
the codebook, and the most similar 
codevector is selected. Euclidean distance is used for 
the distance matching. After performing VQ for all 
the blocks divided from a facial image, matched 
frequencies for each codevector are counted and a 
histogram is generated. 

Figure 3. Organization of Codebook 

The codebook we used in our 
implementation is systematically organized with 33 
codevectors having monotonic intensity variation. The 
first thirty-two vectors are generated by changing 
directions and range of intensity variation as shown in 
Figure 3. The last vector contains no 
directions and variation. 

Figure 4. Face VQ Histograms 

There are at least two advantages to this method of 
representing faces with VQ histograms. Because the 
VQ Histogram face feature relies on the histogram of 
the VQ code, it ignores the geometry information, 
which makes the VQ Histogram feature insensitive to 
the face position. Also, since the VQ code is generated 
from the small, minimum-intensity-subtracted 4-by-4 
block, it greatly reduces the effect of lighting on the 
face. Figure 4 shows a VQ Histogram 
example. 

5. Probabilistic Neural Networks and 
Modification 

The Probabilistic Neural Network is one 
implementation of the Bayes strategy, which seeks the 
minimum risk cost based on the Probability 
Distribution Function (PDF). The standard 
Probabilistic Neural Network contains three layers: 
input layer, hidden layer and category layer (see 
Figure 5). The input layer normalizes the 
input vectors and forwards the normalized data to the 
hidden layer. The hidden layer then adds one node 
for each training example and saves the normalized 
input vector as its weights. (talk about section 5.3) 

Figure 5. Standard PNN Structure 

The category layer chooses the maximum value 
calculated from the hidden layer and calculates the 
confidence measurement based on the values from the 
hidden layer. 

Sections 5.1 and 5.2 introduce PNN training and 
classification, and Section 5.3 discusses an adaptive 
threshold modification of the PNN model. 

5.1 Probabilistic Neural Network Training 

During training, the Probabilistic Neural Network 
normalizes all the data from the input vectors. The 
normalization calculation is shown below: 

\[ X'_n = \frac{X_n}{\lVert X_n \rVert} \qquad (2) \]

where X_n is the input vector and X'_n is the 
normalized input vector. 

During each training phase, PNN adds a new node 
to the hidden layer and links the new node with all the 
nodes in the input layer. In order to estimate the PDF of 
each category for classification, the weight 
between the new node and the input nodes is assigned 
the normalized input vector; in other words, 
all the examples are saved in the links between input nodes 
and hidden nodes. This allows the PNN to generate a 
Probability Distribution Function during the 
classification phase. Then, the new node in the 
hidden layer is linked to the supervised category. Unlike 
Radial Basis Function Networks, in PNN the links between 
the hidden layer and the category layer are not fully 
connected and do not contain any weights. 

The training algorithm is listed below [5]: 

1. for x_i, i = 1, 2, ..., n 

2. normalize: x'_{ik} = x_{ik} / \sqrt{\sum_{j=1}^{d} x_{ij}^2}, k = 1, ..., d 

3. assign weights: w_{ik} = x'_{ik} 

4. if x_i ∈ ω_j then c_{ij} = 1 

5. end 

where x_i, i = 1, 2, ..., n are the input vectors and d 
is the number of input dimensions, w_i is the weight 
vector between the input nodes and the new hidden node i, 
and c_{ij} is the link between the new hidden node i and 
category node j. 
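
The training step can be sketched as below, assuming that normalization means scaling each input vector to unit length. The class name `PNN` and its attribute names are hypothetical choices for this illustration.

```python
import math

def normalize(x):
    """Scale a vector to unit length (assumes x is not the zero vector)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]

class PNN:
    """Minimal PNN store: one hidden node per training example."""

    def __init__(self):
        self.weights = []   # normalized input vector saved in each hidden node
        self.category = []  # category that each hidden node is linked to

    def train(self, x, category):
        # Add one hidden node, save the normalized input as its weights,
        # and link the node to the supervised category.
        self.weights.append(normalize(x))
        self.category.append(category)

# Four 2-dimensional examples, two per category; the values are made up.
net = PNN()
for x, cat in [([3.0, 4.0], 1), ([1.0, 1.0], 1), ([-2.0, 0.5], 2), ([-1.0, 0.0], 2)]:
    net.train(x, cat)
```

No existing weight is modified when a node is added, which is why the same `train` call can also serve for learning a new face online under a new category label.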
Figure 6 shows an example of PNN 
training with 2-dimensional input vectors. In the figure, 
we show the increasing number of hidden nodes. 

Let the training set X consist of four input vectors: 
X = {X_1, X_2, X_3, X_4}, where input vectors {X_1, X_2} are 
supervised to output category 1, and input vectors 
{X_3, X_4} are supervised to category 2. 

Figure 6. PNN Training Example 

For the first training phase shown in 
Figure 6a, the training data X_1 is normalized into X'_1 
and a new hidden node H_1 is added in the hidden 
layer. The weight of the new hidden node H_1 is 
assigned using the normalized training data X'_1. Then, 
the hidden node H_1 is linked to the supervised category 
node ω_1, which stands for output category 1 in PNN. 
For the second training phase, for training data X_2, 
PNN performs the same operations on X_2 as the 
operations on the first training data X_1. As a result of the 
first and second training, the PNN contains two nodes 
in the hidden layer and they are linked to category 1 as 
shown in Figure 6b. 

Since the third training data is supervised to 
category 2, PNN adds a new node H_3 in the hidden layer 
and assigns the weights as normalized training data 
X'_3, but PNN links the new node H_3 to category node 
ω_2, which refers to category 2, as shown in Figure 
6c. The PNN performs the same operations 
for training data X_4. Figure 6d shows a 



5.2 Probabilistic Neural Networks Classification 

Because PNN saves all the information from the 
examples in the hidden layer during training, with 
some density estimators, PNN can generate a PDF for 
all the trained categories based on this saved 
information. There are multiple methods to estimate 
the PDF, such as Parzen windows and the Gaussian 
Model. 

During the classification process, the Probabilistic 
Neural Network normalizes the input vector as shown 
in the Probabilistic Neural Network training procedure 
(see Section 5.1). Then in the hidden layer, PNN 
performs a dot product between the normalized input 
data and the saved weights as shown below: 

\[ Z_i = X' \cdot W_i \qquad (3) \]

where X' is the normalized input vector, W_i 
is the weight and Z_i is the output value for each node in 
the hidden layer. 

Then PNN performs the following calculation: 

\[ Z'_i = \sum_{k=1}^{n_i} \exp\left[ (Z_k - 1) / \sigma^2 \right] \qquad (4) \]

where σ is the smoothing factor and Z_k is the 
output value (3) of each hidden node linked to category i. 

In the category layer, the PNN makes a classification 
decision based on the Bayes decision rule, which is 
shown below in our case: 

\[ d(\theta = \omega_i) \ \text{if} \ Z'_i / n_i > Z'_j / n_j \quad \forall j \neq i \]

Then, PNN calculates the confidence measurement 
based on the PDF output from the hidden layer. The 
nodes in the category layer perform the following 
calculation: 

where C_i is the confidence measurement for 
each category and n_i is the number of hidden nodes for 
category i. 
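
A minimal sketch of this classification pass follows. Two points are assumptions rather than statements from the text: the per-node activations are summed per category before dividing by the node count, and the confidence (whose formula is not legible in this copy) is taken to be each category's averaged output divided by the total over all categories. The function name `classify` is hypothetical.

```python
import math

def classify(weights, categories, x_norm, sigma):
    """Return (best category, per-category PDF output, per-category confidence).

    weights    -- normalized vectors saved in the hidden nodes
    categories -- category linked to each hidden node
    x_norm     -- normalized input vector
    sigma      -- smoothing factor
    """
    sums, counts = {}, {}
    for w, cat in zip(weights, categories):
        z = sum(a * b for a, b in zip(x_norm, w))   # dot product per hidden node
        act = math.exp((z - 1.0) / sigma ** 2)      # activation per hidden node
        sums[cat] = sums.get(cat, 0.0) + act
        counts[cat] = counts.get(cat, 0) + 1
    pdf = {cat: sums[cat] / counts[cat] for cat in sums}      # Z'_i / n_i
    total = sum(pdf.values())
    confidence = {cat: p / total for cat, p in pdf.items()}   # assumed form
    best = max(pdf, key=pdf.get)                              # decision rule
    return best, pdf, confidence
```

Under this assumed confidence, the values always sum to one across categories, so a category can receive a high confidence even when every PDF output is small.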



5.3 Adaptive Threshold in Probabilistic Neural 
Networks 

In the previous section, we introduced the PNN 
classification procedure. As we mentioned, the PNN 
can generate the confidence measurements for all 
categories; however, from formula (5), these 
confidence measurements are generated by comparing 
the relative results from the PDF output of the saved 
examples. Since PNN compares the relative results, 
sometimes PNN might generate a high confidence 
output for one category even though the output from 
the PDF for this category is very low. 

An example can make the above idea clearer. 
Figure 7 shows a trained PNN with 
one-dimensional input vectors. The PDF is generated based 
on the saved examples in the hidden layer. In this 
case, the input data X_1 is classified as category w_1, and 
the confidence is calculated by Formula (5). Based on 
the probability from the PDF on both category w_1 and 
w_2, p(X_1|w_1) = 0.1 and p(X_1|w_2) = 0.02, which generates a 
confidence for input data X_1 of 83%. Also, based on 
Formula (5), the input data X_2 is classified as 
category w_2 with 66% confidence and input data X_3 
is classified as category w_1 with 80% confidence. 




Figure 7. PDF Analysis in PNN: Counter-intuitive 
confidence problem 

The problem is that the input X_1 obtains 83% 
confidence, which is larger than the 66% confidence for 
input X_2; however, the input X_2 is closer to the mean 
of the PDF of category w_1, which means input X_2 
should obtain a higher confidence than input X_1. It is 
even trickier to explain that the input X_3 has 80% 
confidence and in reality cannot provide an 80% 
We introduce a threshold in the category layer of PNN 
to solve the above counter-intuitive confidence 
problem, which avoids the small PDF output. The 
category threshold implies a low threshold result in 
more patterns being declared unknown and an 

We update the formula (5) so that it can identify 
the pattern with low PDF outputs. Formula (7) shows 
an updated Bayes decision rule with the ability to 

\[ d(\theta = \text{unknown}) \ \text{if} \ Z'_i / n_i > Z'_j / n_j \ \text{and} \ Z'_i / n_i < t_i \quad \forall j \neq i \qquad (7) \]

where t_i is the threshold for category i. 

Because the PDFs for all categories are different in 
mean, deviation and maximum conditional probability 
value, a static threshold might arbitrarily erase the right 
classifier result, which contains low PDF output. 
Therefore, in order to avoid this problem, different to 
the technique in [3], we developed an adaptive 
threshold. The adaptive threshold is chosen based on 
the percentage of the value of PDFs, which means it 
calculates the maximum of the PDF from examples, and 
for each category a threshold is set based on its 
maximum PDF value. As a result, each category will 
have its own threshold. After any category is selected 
as classification output, which means its PDF output is 
the maximum, its PDF output will be compared with 
its threshold. Only when the PDF output of the category 
is larger than its own threshold can this category be 
selected as classification output. Figure 8 
shows an updated example with adaptive threshold. 


Figure 8. Adaptive Threshold in PDFs 
The threshold in Figure 8 is set to 70% of the 
maximum PDF value; t1 is the threshold for 
category 1 and t2 is the threshold for category 2. For input 
vector x1, p(x1|w1) is equal to 0.1, which is the 
maximum value among the category PDF 
values; however, p(x1|w1) is still lower than 70% of the 
maximum PDF value. Thus, input vector x1 is 
classified as unknown category. For input vector x2, 
the maximum PDF output among all categories is 
p(x2|w1), which is equal to 0.4. p(x2|w1) is larger 
than 70% of the maximum PDF value. PNN then 
classifies the input vector x2 as category 1. PNN 
performs the same procedure for input vector x3, and 
because the PDF output for vector x3 is lower than the 
threshold derived from the maximum value of category 2, 
PNN classifies vector x3 as unknown category again. 

The result of setting thresholds in the category layer is to 
avoid the low confidence classification outputs. Also, 
as a result of avoiding the low confidence 
classification, this implementation can detect the 
unknown categories, which can lower the false alarm 
of face recognition and detect the unknown faces in 
our case. 
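
The adaptive threshold rule can be sketched as follows. The function name `classify_with_adaptive_threshold` is hypothetical, and the per-category maximum PDF value of 0.5 used in the usage lines is a made-up number chosen only so that 70% of it (0.35) lies between the PDF outputs 0.1 and 0.4.

```python
def classify_with_adaptive_threshold(pdf, max_pdf, ratio=0.7):
    """Pick the category with the largest PDF output, or "unknown".

    pdf     -- PDF output per category for the current input
    max_pdf -- maximum PDF value per category, measured on its saved examples
    ratio   -- fraction of the maximum used as that category's threshold
    """
    best = max(pdf, key=pdf.get)
    threshold = ratio * max_pdf[best]   # each category has its own threshold
    if pdf[best] < threshold:
        return "unknown"
    return best

# Hypothetical maxima; only the ratio of 70% is taken from the text.
max_pdf = {1: 0.5, 2: 0.5}
label_x1 = classify_with_adaptive_threshold({1: 0.1, 2: 0.02}, max_pdf)
label_x2 = classify_with_adaptive_threshold({1: 0.4, 2: 0.05}, max_pdf)
```

With these numbers the first input is rejected as unknown even though category 1 wins the relative comparison, while the second input is accepted as category 1.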



6. Detecting and Online Learning New 
Faces 

Section 5 introduced an Adaptive Threshold 
Probabilistic Neural Network (ATPNN) with the 
ability to recognize the pattern and identify the 
unknown pattern. Using the face features we 
introduced in Section 4 as ATPNN input, and training 
the ATPNN with enough face training data, the 
ATPNN can be set up as a face classifier. The ATPNN 
based face classifier can recognize the known face and 
identify the unknown face as well. 

We introduce rules to detect the unknown faces for 
videos based on the ATPNN face classifier in Section 6.1. 
Section 6.2 introduces a method to learn the 
new faces online in ATPNN. 

6.1. New Face Detection in Videos 

Although the ATPNN can identify the unknown 
faces, the performance of the ATPNN face classifier 
for unknown face images is not satisfactory. Based on 
our experience, the recall rate of the unknown face 
identification is low with a higher threshold, and the 
false alarm rate of unknown face identification is high 
with a lower threshold. This means that we either miss 
the unknown faces with a higher threshold or falsely 
detect the known faces as unknown faces with a lower 
threshold. Therefore, a post-analysis on the ATPNN based 
face classifier is necessary for detecting the unknown 
faces. 


The solution for detecting new faces takes advantage of both the
ATPNN and the temporal nature of videos. On its own, the ATPNN
cannot identify unknown faces exactly, because of the low recall
rate or the high false alarm rate. As opposed to single face images,
a video containing faces provides not only face images but also
face sequences over time. Therefore, we design several conditions
to detect new faces, which utilize the advantages of both the ATPNN
and videos.

The conditions to detect a new face are shown below:

1. The ATPNN face classifier identifies the face as an unknown face

2. The mean of the PDF output is low

3. The variance of the input vectors is small

4. All of the above three conditions last for 10 seconds



Condition 1 identifies the input face as an unknown face, and
condition 2 evaluates the mean of the PDF output over the face
sequence. Condition 3 measures the spread of the input vector
sequence by computing its standard deviation, in order to make sure
that the input vectors belong to the same face. If all three
conditions are met within a 10-second video clip, we conclude that
a new face has appeared in the video.

The algorithm for the above conditions is shown below:

if d(θk = unknown) then
    save xk and zk into buffer
    if sizeof(buffer) > 10 seconds * 24 &
       mean of zk < th_z &
       variance of xk < th_x then
        New Face Found
        Do Online Learn
    endif
endif



Step 1. For face frame k, from the ATPNN,
    if d(θk = unknown)
        - save input vector xk and PDF output zk to buffer
        - Go to Step 2
    else
        - clear buffer, go to next frame k+1
        - Go to Step 1

Step 2. In the buffer,
    if buffer size > 10 seconds * 24
        - calculate the mean of the PDF outputs zk
        - calculate the variance of the input vectors xk
        - Go to Step 3
    else
        - Go to next frame k+1
        - Go to Step 1

Step 3. If the mean and the variance are low
        - A new face is found; do online learning for the new face
        - clear buffer
        - Go to Step 1
    else
        - clear buffer
        - Go to Step 1
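
The three steps above can be sketched in Python as a small per-frame state machine. This is only an illustrative sketch, not the authors' implementation: the class name `NewFaceDetector`, the threshold names `th_z` and `th_x`, and the use of the mean squared distance from the centroid as the "variance of the input vectors" are assumptions, and 24 is taken to be the frame rate implied by "10 seconds * 24".

```python
from statistics import fmean

FPS = 24             # assumed frame rate, from "10 seconds * 24"
WINDOW_SECONDS = 10  # persistence window from condition 4


def variance_of_vectors(vectors):
    """Mean squared distance of the vectors from their centroid (assumed measure)."""
    dim = len(vectors[0])
    centroid = [fmean(v[k] for v in vectors) for k in range(dim)]
    return fmean(
        sum((v[k] - centroid[k]) ** 2 for k in range(dim)) for v in vectors
    )


class NewFaceDetector:
    """Hypothetical helper that buffers consecutive 'unknown' face frames."""

    def __init__(self, th_z, th_x, min_frames=FPS * WINDOW_SECONDS):
        self.th_z = th_z              # threshold on the mean PDF output
        self.th_x = th_x              # threshold on the input-vector variance
        self.min_frames = min_frames  # buffer size that must be exceeded
        self.buffer = []              # list of (x_k, z_k) pairs

    def update(self, label, x_k, z_k):
        """Process one face frame.

        Returns the buffered input vectors when a new face is found,
        otherwise None.
        """
        # Step 1: a known face breaks the sequence, so the buffer is cleared.
        if label != "unknown":
            self.buffer.clear()
            return None
        self.buffer.append((x_k, z_k))

        # Step 2: wait until the buffer covers the persistence window.
        if len(self.buffer) <= self.min_frames:
            return None

        # Step 3: test mean and variance, then clear the buffer either way.
        xs = [x for x, _ in self.buffer]
        mean_z = fmean(z for _, z in self.buffer)
        var_x = variance_of_vectors(xs)
        self.buffer.clear()
        if mean_z < self.th_z and var_x < self.th_x:
            return xs  # candidate vectors for online learning
        return None
```

A caller would feed `update` the classifier label, input vector and PDF output of every detected face frame, and hand any returned vectors to the online learning step.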

With the above algorithm, the accuracy of new face detection
becomes very good. Also, because it keeps track of the face
sequence over 10 seconds, we can avoid random faces that merely
happen to be shown in the video.

6.2. Faces Online Learning 

Once the algorithm detects a new face, online learning of the new
face is performed. The advantage of the PNN is that we do not need
to update all the other weights during training [4]. This allows
online learning without too many calculations during the updating
of weights.

As we described in section 5.1, we store face input vectors in the
buffer and we evaluate the variance and mean of these input
vectors. In the buffer, the lower-variance input vectors contain
more precise information about the new face.

We choose the 10 input vectors Xi in the buffer that have the
lowest variance among all the inputs. Then, the PNN learning
algorithm is performed for the new input vectors. The procedure of
the online training is almost the same as off-line training:
normalize the input vector Xi with formula (2). Add a new node i
into the hidden layer and assign its weights from the normalized
input vector X'i. Then, add a new category ω_new in the category
layer and link the hidden node i with the new category.

The algorithm for online learning is shown below, where Xi are the
input vectors, d is the number of dimensions, Wi is the weight
vector between the input nodes and the new hidden node i, and Cji
is the link between hidden node i and the new category node j.
Figure 9a shows a trained ATPNN with 2 faces in the database. In
this diagram, each hidden node is represented by a face because the
nodes save the information of the face during training. Figure 9b
shows a PNN after the online learning of a detected new face. In
this diagram, the number of nodes in the hidden layer has increased
and the information for the new face has been added into the hidden
layer.



for i = 1, 2, ..., 10
    normalize: $X'_i = X_i / \sqrt{\sum_{k=1}^{d} X_{ik}^2}$
    assign weights: $W_i = X'_i$
end




[Figure 9 labels: Tom Cruise, Julianne Moore, New Face]
Figure 9. PNN Online Learning/Extension
The information about the new face will be stored in the hidden
layer and category layer of the Probabilistic Neural Network.
Therefore, when the "new face" appears again, the Probabilistic
Neural Network will recognize this face as a known face.

7. Examples 

In our implementation, we test the algorithm with a dynamic
threshold. If the value of r in formula 4 is below the threshold,
we suppose it is an unknown face and the PNN will output an
unknown face ID result.
[Generate the curve based on different Th in Matlab]

The training faces are listed below:

[Training face images: Tom Cruise, Julianne Moore, Unknown]



We tested the algorithm on the movies "Magnolia" (1999) and
"Minority Report" (2002). The reason for choosing these two movies
is that Tom Cruise appears in both of them as the main actor. We
can train our PNN on Tom Cruise's faces in "Magnolia" and use this
PNN model to recognize Tom Cruise's face in "Minority Report".

[Have some diagram to show the detected new faces in here]

8. Implementation and Results

We ran a test with different thresholds on the movie "Magnolia";
the first column is generated with the maximum threshold, and the
last column is generated with a threshold equal to 0.09.




Figure 10. Test results with Static Threshold

Test with adaptive threshold.
And test result for online learning.

[TO BE DONE: Here we will include ROC curves and other results]



9. Conclusion

The main idea is to recognize known faces, detect unknown faces
and apply automatic online learning for unknown faces in video.
After the online learning, our Classifier can recognize the new
(unknown) faces presented before. After the recognition, the
Classifier will assign recognized face IDs to the faces.

ABSTRACT

Video retrieval in consumer applications demands high-level
semantic descriptors such as people's identity. The problem is that
in a variety of videos, such as home videos, Hollywood content, TV
broadcast content and mobile phone videos, faces are not easy to
recognize. Moreover, a closed system trained to recognize only a
predetermined number of faces will become obsolete very easily. We
developed an online-learning face recognition system for a variety
of videos based on Adaptive Threshold Probabilistic Neural Networks
(ATPNN). This face recognition system can detect and recognize
known faces, as well as automatically detect unknown faces and
train the unknown faces online into new face classifiers, such that
this "unknown face" can be recognized if it appears again. ATPNN is
a variant of PNN with adaptive thresholding on the category
(output) layer of a Probabilistic Neural Network (PNN) in order to
detect unknown categories of input data. The PNN feed-forward
training makes the online training very fast because adding new
faces does not require retraining of the known categories. Our
results show that off-line and on-line learning yield comparable
results. The real added benefits are: 1) we can build open systems,
and 2) PNN makes online training fast because there is no
retraining required for known faces.

1. INTRODUCTION 
Most face recognition systems are trained on a fixed number of
faces that are known in advance. These systems will only recognize
the faces with known models, and the face database cannot be
updated during the classification procedure. In this respect these
systems are very limited once they are placed in operation. For
example, they will work for surveillance systems, which have to
recognize all employees of a company and alert to any intruders, or
an airport surveillance system that is trained to recognize known
terrorists. However, in the area of home video, TV broadcast video
and wearable video, in addition to the known people there is a need
to recognize the new people appearing with each new video. In home
videos, for example, if a system is trained to recognize only
family members, then a visitor is labeled as "other" or "unknown".
Of course there are travel videos with many new faces that are
transient. A system that categorizes images and videos based on
people presence has to distinguish all these categories of
important and unimportant faces. Moreover, the system has to be
flexible enough to incorporate and retain important faces.

Our approach can automatically detect the new faces and extend the
database based on the new faces. Our online learning system can
learn features of new, reoccurring faces and store corresponding
models of new faces for future use. Also, our approach can generate
a confidence measurement for each recognized face in the database
and sort the candidates by the confidence measurement, which makes
post-processing easier.

2. SYSTEM ARCHITECTURE 
Figure 1 shows our face recognition system architecture. There are
two approaches to bootstrap the system: 1) the initial database has
a limited number of faces, and 2) the initial database is empty. If
the system is first trained on the initial database, we can gain
higher recognition accuracy on the initial database. This method is
similar to our human perception of known faces and incorporation of
new faces. The system has a training phase and a classification
phase just like any other face recognition system (depicted with a
dashed line). However, the important aspect here is that there is a
feedback arrow to the training phase for unknown faces. The
persistent (reoccurring) faces become new sample faces for the
online training (dotted ellipse in Figure 1).

During the training phase, the system reads face examples for each
face (actor/character) and trains the Probabilistic Neural Network
(PNN) [4][2] based on the features of these faces. We choose Vector
Quantization Histogram features as face features [1]. During the
classification phase, the system first decodes the MPEG video file
into video frames. For each frame, we use a variant of the face
detector described in [8]. If a face is found by the face detector,
the face segment is forwarded to the PNN-based Face Classifier. A
confidence measurement for each face ID is generated by the PNN.
Based on adaptive thresholding of the confidence values and a set
of conditions, the system determines if the face is known or
unknown. Persistent unknown faces are evaluated and forwarded to
the online learning phase. After we have the confidence measurement
for each Face ID, we can easily choose the Face ID with the maximum
confidence measurement as the output from the Face Classifier by
using a winner-takes-all principle.



[Figure 1 block labels: Video Frames; Initial Sample Faces
(optional); Face ID; Sample Faces; Online Training]

Figure 1. Face Recognition System Architecture

3. FACE DETECTION 
This section briefly describes the face detection algorithm used in
our framework. In [8], Viola and Jones applied the popular AdaBoost
[12] learning technique to the problem of rapid object detection.
They used an attentional cascade of strong classifiers that
consisted of a set of computationally efficient binary features
(also called weak classifiers). Each round t of boosting added a
single feature $h_t$ to the current set of features by minimizing:

$Z_t = \sum_i D_t(i) \exp(-\alpha_t y_i h_t(x_i))$

where $D_t(i)$ is the weight on example $x_i$ at round t,
$y_i \in [-1, 1]$ is the target label of the example, $\alpha_t$ is
the influence of this weak hypothesis on the strong classifier, and
$h_t()$ is the weak binary hypothesis restricted to [-1, 1]. In our
variant, we use boosting stumps (decision trees that partition the
domain into two pieces and yield a prediction for each partition)
as the weak classifiers, which results in $\alpha_t$ being folded
into $h_t$, thereby allowing the weak hypotheses to range over all
of $\mathbb{R}$ rather than the restricted range [-1, +1]. The
prediction values for the left and right partitions that minimize
$Z_t$ above are:



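$\frac{1}{2}\ln\left(\frac{W_+^L + \epsilon}{W_-^L + \epsilon}\right) \quad \text{and} \quad \frac{1}{2}\ln\left(\frac{W_+^R + \epsilon}{W_-^R + \epsilon}\right)$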



where the W's denote the weight of the examples that are assigned
to the left or right partition with true labels "positive" or
"negative". The predictions are also smoothed with the term
$\epsilon$ to avoid numerical problems caused by large predictions.
From these prediction values, we can greedily choose the splitting
criterion for the decision tree (dropping the subscript t) as

$Z = 2\left(\sqrt{W_+^L W_-^L} + \sqrt{W_+^R W_-^R}\right)$

rather than the Gini index or an entropic function [12].
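
As a small worked illustration of the stump quantities described above, the following Python sketch computes the smoothed prediction for each partition and the splitting criterion Z. It assumes the standard confidence-rated boosting forms, namely a prediction of one half the log-ratio of positive to negative weight (each smoothed by ε) and Z equal to twice the sum of the square roots of the weight products. The function names and the default value of `eps` are hypothetical and are not taken from the paper.

```python
import math


def stump_predictions(w_pos_left, w_neg_left, w_pos_right, w_neg_right, eps=1e-3):
    """Smoothed prediction values for the left and right partitions.

    Each argument is the total weight of positive or negative examples
    that fall into the given partition. eps is an assumed smoothing value.
    """
    c_left = 0.5 * math.log((w_pos_left + eps) / (w_neg_left + eps))
    c_right = 0.5 * math.log((w_pos_right + eps) / (w_neg_right + eps))
    return c_left, c_right


def split_criterion(w_pos_left, w_neg_left, w_pos_right, w_neg_right):
    """Splitting criterion Z; smaller values indicate purer partitions."""
    return 2.0 * (
        math.sqrt(w_pos_left * w_neg_left)
        + math.sqrt(w_pos_right * w_neg_right)
    )


def best_split(candidates):
    """Greedily pick the candidate split with the smallest Z.

    candidates maps a split name to its four partition weights.
    """
    return min(candidates, key=lambda name: split_criterion(*candidates[name]))
```

For example, a split that sends all positive weight to one side and all negative weight to the other has Z = 0 and would be preferred over any mixed split.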
A few variants [9][11] of the learning algorithm described in [8]
have been proposed recently. These algorithms reduce the training
error (i.e. error in the training set) during training and count on
the generalization performance of AdaBoost that is rigorously
proved in [12]. It is our experience that using a validation set
during training as in [8][10] yields the most effective cascades
with fewer features. This is due to the fact that we get multiple
hits around each face while scanning the validation set, and we can
pick the strong classifier threshold as high as possible in order
to retain just one hit, thereby eliminating more false alarms in
the process. However, one must ensure that this threshold is not
chosen so high that too many positive training images are missed.
In addition, we scan the validation set just once (rather than
several times as in [10]) to adjust the strong classifier threshold
as each weak classifier is added to the current cascade. We do this
by keeping track of the rectangles and their corresponding
last-stage sums that pass through all but the penultimate stage of
the current cascade (for the first stage, this amounts to keeping
track of all rectangles scanned and their corresponding sums). We
use around 4000 positive samples and 5000 negative samples for
training each stage of the cascade, where the negative samples for
each stage are the false positives obtained by scanning the current
cascade on an image set with no faces. Our validation set consists
of around 200 faces.

4. ONLINE FACE RECOGNITION 
This section describes the online face recognition algorithm used
in our framework. First, we introduce a face classifier based on
Adaptive Threshold Probabilistic Neural Networks (ATPNN), which are
developed from Probabilistic Neural Networks [4]. We have two
reasons for choosing PNN as our face classifier: 1) we can measure
the confidence of the outputs based on the preset threshold, and
2) PNN is a feed-forward training model, which means it is not
necessary to retrain the existing links in the PNN when adding a
new node in the category (output) layer. Then, we introduce the
conditions we use for new face detection and the online learning
algorithm for new faces in the later sub-sections.

4.1. Adaptive Threshold Probabilistic Neural Networks 
The ATPNN is developed from Probabilistic Neural Networks, which
Specht, D. F. first introduced in [4]. The PNN is an implementation
of the Bayes strategy, which seeks the minimum risk cost based on
the Probability Distribution Function (PDF). The Bayes decision
rule used in PNN is shown below:

$d(\theta \in \theta_i) \quad \text{if } z_i > z_j \;\; \forall j \neq i \qquad (1)$

where $\theta$ is the output category and $z$ is the PDF output of
each input vector.

The PNN can generate a confidence measurement by comparing the
relative results from the PDF output of the saved examples.
However, this makes the PNN generate a high-confidence output for
one category even though the PDF output for this category is very
low. For example, Figure 2 shows a trained PNN with a
one-dimensional input vector. The PDF is generated based on the
saved examples in the hidden layer. Without the threshold t, the
input vector x1 would be classified as category ω1 with a high
confidence of around 80%. However, the PDF output for the input
vector x1 is low enough for it to be identified as an unknown
category.

By adding a threshold in the category layer of the PNN, the PNN can
identify unknown categories [3] and also avoid classifying input
vectors with a low PDF output. After the modification, the Bayes
decision rule is updated as below:

$d(\theta \in \text{unknown}) \quad \text{if } z_i > z_j \text{ and } z_i < t \;\; \forall j \neq i \qquad (2)$

where t is the threshold in the category layer.

The ATPNN-based face classifier can recognize known faces and
identify unknown faces as well. If the outputs for all faces (nodes
in the category layer) are below the threshold, we assume the input
face is an unknown face. Figure 2 shows the strategy.




Figure 2. Adaptive Threshold for PDFs in ATPNN
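
The thresholded decision described in this subsection can be sketched as follows. The sketch assumes that the PDF outputs z for every category are already available and that the threshold t is supplied by the caller; how the threshold is adapted is not specified in this section, so it is left as a plain parameter. The function name `atpnn_decide` and the dictionary-based interface are hypothetical.

```python
def atpnn_decide(pdf_outputs, threshold):
    """Winner-takes-all decision with an 'unknown' fallback.

    pdf_outputs maps each known category (face ID) to its PDF output z.
    Returns (label, ranked), where ranked lists all candidates sorted by
    decreasing PDF output, which can serve as a confidence ordering.
    """
    ranked = sorted(pdf_outputs.items(), key=lambda item: item[1], reverse=True)
    best_label, best_z = ranked[0]
    # If even the winning category is below the threshold, all outputs are
    # below it, so the input is treated as an unknown category.
    if best_z < threshold:
        return "unknown", ranked
    return best_label, ranked
```

With outputs of 0.30 and 0.05 and a threshold of 0.1 the first category wins; with outputs of 0.03 and 0.05 the same threshold yields "unknown", even though the second category would win a plain winner-takes-all comparison.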

4.2. New Face Detection in Videos

The solution for detecting new faces takes advantage of both the
ATPNN and the temporal nature of videos. On its own, the ATPNN
cannot identify unknown faces exactly, because of the low recall
rate or the high false alarm rate. As opposed to single face images,
a video containing faces provides not only face images but also
face sequences over time. Therefore, we design several conditions
to detect new faces, which utilize the advantages of both the ATPNN
and videos.

The conditions to detect a new face are shown below:

1. The ATPNN face classifier identifies the face as an unknown face

2. The mean of the PDF output is low

3. The variance of the input vectors is small

4. All of the above three conditions last for n (e.g. n=10) seconds

Condition 1 identifies the input face as an unknown face, and
condition 2 evaluates the mean of the PDF output over the face
sequence. Condition 3 measures the spread of the input vector
sequence by computing its standard deviation, in order to make sure
that the input vectors belong to the same face. If all three
conditions are met within the n-second video clip, we conclude that
a new face has appeared in the video. This is a simple use of
"memory". However, if a face appears many times in a video for very
short periods (a high cut rate in a conversation, for instance),
then we need to employ accumulative memory, where the face of a
stranger is learned over time (e.g. repeating faces in home video
that appear at different social gatherings).

4.3. Faces Online Learning

Once the algorithm detects a new face, online learning of the new
face is performed [what method of on-line learning is used?]. The
advantage of the PNN is that we do not need to update all the other
weights during training [4]. This allows online learning without
too many calculations during the updating of weights.

As we described in section 4.2, we store face input vectors in the
buffer and we evaluate the variance and mean of these input
vectors. In the buffer, the lower-variance input vectors contain
more precise information about the new face.

We choose the 10 input vectors Xi in the buffer that have the
lowest variance among all the inputs (i.e. those closest to the
average in the buffer). Then, the PNN learning algorithm is
performed for the new input vectors. The procedure of the online
training is almost the same as off-line training: normalize the
input vector Xi with formula (2). For every Xi, add a new node into
the hidden layer and initialize the weights of the node to the
normalized input vector X'i. Then, add a new category ω_new in the
category layer and link the added hidden nodes to the new category
ω_new.

The algorithm for online learning is shown below:

1.  for i = 1, 2, ..., 10

2.      normalize: $X'_i = X_i / \sqrt{\sum_{k=1}^{d} X_{ik}^2}$

3.      assign weights: $W_i = X'_i$

4.      set the link $C_{ji}$ between hidden node i and the new
        category node j

5.  end

where Xi are the input vectors, d is the number of input
dimensions, Wi is the weight vector between the input nodes and the
new hidden node i, and Cji is the link between hidden node i and
the new category node j.
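
The online learning procedure of this subsection can be sketched in Python as below. This is a minimal sketch under stated assumptions rather than the authors' code: normalization is assumed to scale each vector to unit Euclidean length, "lowest variance" is implemented as smallest squared distance to the buffer average (following the parenthetical remark above), and the class `PNN` with its list-based layers is a hypothetical stand-in that stores only what the learning step touches. Input vectors are assumed to be non-zero.

```python
import math


def normalize(x):
    """Scale x to unit Euclidean length (assumed normalization)."""
    norm = math.sqrt(sum(v * v for v in x))
    return [v / norm for v in x]


def select_closest_to_mean(vectors, count=10):
    """Pick the `count` vectors closest to the average of the buffer."""
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

    def squared_distance(v):
        return sum((v[k] - mean[k]) ** 2 for k in range(dim))

    return sorted(vectors, key=squared_distance)[:count]


class PNN:
    """Hypothetical container for the hidden and category layers."""

    def __init__(self):
        self.hidden_weights = []   # W_i: one weight vector per hidden node
        self.hidden_category = []  # link C: category index of each hidden node
        self.categories = []       # names of the category (output) nodes

    def learn_new_category(self, name, buffered_vectors, count=10):
        """Add one category and up to `count` hidden nodes for it.

        Existing hidden nodes and links are left untouched, which is the
        property that makes online learning cheap.
        """
        self.categories.append(name)
        j = len(self.categories) - 1
        for x in select_closest_to_mean(buffered_vectors, count):
            self.hidden_weights.append(normalize(x))  # W_i = X'_i
            self.hidden_category.append(j)            # link node i to category j
        return j
```

Calling `learn_new_category` with the vectors returned by the new-face detection step appends nodes without modifying any previously stored weight vector.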

Figure 3a shows a trained ATPNN with 2 faces in the database. In
this diagram, each hidden node is represented by a face because the
nodes save the information of the face during training. Figure 3b
shows a PNN after the online learning of a detected new face. In
this diagram, the number of nodes in the hidden layer has increased
and the information for the new face has been added into the hidden
layer.




[Figure 3a labels: Tom Cruise, Julianne Moore]
[Figure 3b labels: Tom Cruise, Julianne Moore, New Face]

Figure 3. Adding a new face using PNN Online Learning/Extension

The information about the new face will be stored in the hidden
layer and category layer of the Probabilistic Neural Network.
Therefore, when the "new face" appears again, the Probabilistic
Neural Network will recognize this face as a known face. Here, of
course, it is possible to continue online training to reinforce the
new face classifier. However, exploring how much weight a new face
example should have in reinforcing the classifier is part of our
future research.

5. IMPLEMENTATION AND RESULTS 
We tested 4 genres of videos: Movies, News Video, Video Conference
and Home Video. The experimental results are shown in Table 1.



Video        Min   # of    Offline        Detected   Online
Category           Faces   Hit    FP                 Hit    FP
Movies       303   39      82%    19%     30         77%    32%
News          22   24      93%    27%      9         81%    18%
Conference    45    6      91%     5%      6         90%     6%
Home Video    28    6      74%    24%      2         52%    45%

Table 1. Experiment results on different genres

In Table 1, the second column, labeled "Min", gives the total
length of the videos in minutes, and the third column, labeled
"Faces", gives the number of detectable faces. "Hit" means the hit
ratio and "FP" means the false positive rate. The column labeled
"Detected" gives the number of faces that have been detected and
trained online in the PNN. We should note here that, by design, we
chose to detect new faces that are persistent for more than 10
seconds.

From Table 1, we see there are 24 faces in the 22-minute News
video; however, the algorithm learned only 9 of the 24 faces
online. This is because most of the new faces in News video are
short in length, and the algorithm ignores such a new face rather
than adding it to the face database. For the movies, during the
online learning experiment we initialized the training set with 4
actors, and for each actor we used 5 face samples.



Movie      Actor             Offline        Online
                             Hit    FP      Hit    FP
Magnolia   Tom Cruise        86%     7%     81%    11%
           Julianne Moore    91%    15%     72%    26%
           Philip B. Hall    67%    19%     65%    24%
           Jeremy Blackman   81%    11%     69%    16%

Table 2. Experiment results for particular actors

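Table 2 shows the experiment results for particular actors. The
recognition results of off-line learning and on-line learning are
both shown in this table. The online learning results are
comparable to the offline learning results: the online hit ratios
are 2 to 19 percentage points lower, and the online false positive
rates are 4 to 11 percentage points higher.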

6. CONCLUSIONS
Open systems for detection and recognition of high-level semantic
descriptors are going to be increasingly valuable in the consumer's
world of multimedia content explosion. Once in operation, the
system should be able to learn and adapt, just as babies learn with
time to recognize the faces of their parents, close relatives and
friends, and keep expanding. In this paper we introduced an online
face recognition system that uses a variant of the Probabilistic
Neural Network. The main goal is to recognize known faces, detect
unknown faces and apply automatic online learning for unknown faces
in video. After the online learning, our Classifier can recognize
the new (unknown) faces presented before. After the recognition,
the Classifier will assign recognized face IDs to the faces. Our
initial results are very promising. In the future we would like to
explore this concept further to include intermittently persistent
faces (e.g. a presidential candidate is shown more often until
he/she becomes important). There are different forms of memory that
can enrich the system to attain more human-like recognition
capabilities.

Re Item V.

1. Reference is made to the following documents:

D1 - RAGINI CHOUDHURY VERMA ET AL: "FACE DETECTION AND TRACKING
IN A VIDEO BY PROPAGATING DETECTION PROBABILITIES", IEEE
TRANSACTIONS ON PATTERN ANALYSIS AND MACHINE INTELLIGENCE,
IEEE INC. NEW YORK, US, vol. 25, no. 10, October 2003, pages 1215-1228

D2 - GONG S., McKENNA S.J. AND PSARROU A.: "DYNAMIC VISION, FROM
IMAGES TO FACE RECOGNITION", IMPERIAL COLLEGE PRESS, 2000



2. CLAIMS 1 and 13 



2.1 Claim 1 



Document D1, which is considered to represent the most relevant state of the art,
discloses (the references in parentheses applying to this document):

A system having a face classifier that provides a determination that a face in a video
input is an unknown face if it fails to correspond to any known face stored in the
classifier, the system then adding the face to the classifier.

From this, the subject-matter of independent claim 1 differs in that a persistence
criterion is further included prior to adding a new face to the classifier.

The problem to be solved by the present invention may therefore be regarded as how
to prevent "spurious" or "fleeting" faces (cf. description page 2, lines 10-16) from
an input video sequence from being added to a face classifier.

The solution to said problem, consisting in including a persistence criterion, is merely
one of several straightforward possibilities from which the skilled person would select,
in accordance with circumstances, without the exercise of inventive skill, in order to
solve the problem of preventing "spurious" faces from being added to the classifier
(cf. D2 page 135, lines 24-27 and page 245, lines 22-27).

The solution proposed in claim 1 of the present application cannot therefore be
considered as involving an inventive step (Article 33(3) PCT).

2.2 Claim 13 

2.2.1 The subject-matter of method claim 13 merely relates to a set of intellectual
steps which are covered by the provision of Rule 67(1)(iii) PCT. Although the
examining division is not under the obligation to formulate an opinion with
respect to the subject-matter of this claim (Article 34(4)(a)(i) PCT), the following
provisional opinion is nevertheless established:

2.2.2 Claim 13 corresponds mutatis mutandis in terms of method to independent
system claim 1. Claim 13 is therefore also considered as lacking an inventive
step (Article 33(3) PCT).



3. CLAIMS 2-6, 9, 11, 12, 14, 15 and 17-24

Dependent claims 2-6, 9, 11, 12, 14, 15 and 17-24 do not contain any features
which, in combination with the features of any claim to which they refer, meet the
requirements of the PCT in respect of inventive step, the reasons being as follows:

3.1 Claims 2-4, 9 and 10 

The feature of claims 2-4, 9 and 10, consisting in using a Probabilistic Neural Network
(PNN) as a classifier, is one of several straightforward possibilities from which the
skilled person would select, in accordance with circumstances, without the exercise of
inventive skill, in order to solve the problem of selecting a classifier for identifying
whether a face is known or not in a video sequence.

3.2 Claims 5, 14 and 22



Form PCT/ISA/237 (Separate Sheet) (Sheet 2) (EPO-January 2004)



WRITTEN OPINION OF THE 
INTERNATIONAL SEARCHING 
AUTHORITY (SEPARATE SHEET) 



International application No. 
PCT/IB2005/050399 



The feature of claims 5, 14 and 22 is also disclosed in document D2 as providing the
same advantages as in the present application (page 135, lines 24-27 and page 245,
lines 22-27). The skilled person would therefore regard it as a normal option to
include this feature in the system described in document D1 in order to solve the
problem of preventing "spurious" faces from being added to the classifier.

3.3 Claims 6 and 15

The feature consisting in tracking the face in the video input is also disclosed in 
document D1 (section 2.1). 

3.4 Claim 11 

The feature of claim 11 is also disclosed in document D1 (page 433, left column,
lines 3-6).

3.5 Claim 12

The feature of claim 12 is also disclosed in document D1 (page 433, left column,
lines 53-61).

3.6 Claim 17 

The feature of claim 17 is one of several straightforward possibilities from which the
skilled person would select, in accordance with circumstances, without the exercise of
inventive skill, in order to solve the problem of selecting a classifier for identifying
whether a face is known or not in a video sequence.

3.7 Claims 18-24

The additional features of claims 18-24 are also disclosed in document D1 (sections
2.1 and 2.2).

4. CLAIMS 7 AND 16

The solution to the problem of deciding whether an unknown face detected in a
sequence of images from an input video should be added to a classifier, based on the
persistence criterion proposed in claims 7 and 16 of the present application, is
considered as involving an inventive step (Article 33(3) PCT) for the following
reasons:

Calculating the persistence of a face in a sequence of images on the basis that the
following criteria:

(i) detection of a sequence of unknown faces by a PNN,
(ii) the mean PDF of the feature vectors is below a first predetermined threshold, and
(iii) the variance of the feature vectors for the said sequence of faces is below a
second predetermined threshold,

are satisfied for a minimum period of time is considered as providing an
improved method for preventing spurious unknown faces from being incrementally
added to the classifier.

Such a solution is not known from, nor suggested by, the available prior art.



5. FURTHER REMARKS

5.1 Contrary to the requirements of Rule 5.1(a)(ii) PCT, the relevant background art
disclosed in the documents D1 and D2 is not mentioned in the description, nor are
these documents identified therein.

Re Item VIII.

1. CLAIMS 1, 13 AND 19

The wording "persistence criteria" used in claims 1, 13 and 19 is unclear and leaves
the reader in doubt as to the meaning of the technical feature to which it refers,
thereby rendering the definition of the subject-matter of said claims unclear (Article 6
PCT).

A way to alleviate this objection would be to include the features of claims 5 and 7 in
independent system claim 1, and their corresponding method features in
independent method claim 13.

2. CLAIM 19

Claim 19, although drafted as an independent system claim, contains all the features
of claim 1. Claim 19 is therefore not appropriately formulated as a claim dependent
on the latter (Rule 6.4 PCT).

Furthermore, the wording "prominence criteria" used in claim 19 is vague, thereby
rendering the definition of the subject-matter of said claim unclear (Article 6 PCT).

3. CLAIMS 5 AND 14 

The wording "the same unknown face is present in the video input for a minimum 
period of time" used in claims 5 and 14 is vague and leaves the reader in doubt as to 
the meaning of the technical feature to which it refers, thereby rendering the definition 
of the subject-matter of said claims unclear, Article 6 PCT. 

Said "minimum period of time" could indeed be inferior to the interval between two
successive frames in the input video sequence, meaning that the system would not
discriminate "spurious" faces appearing only in one single frame. A way to alleviate
this objection would be to indicate that the unknown face is present for at least a
minimum number of frames (or images) of the input video sequence (as specified in
claim 22), said number being superior to one (cf. description page 18, line 13 - page
20, line 3).